Task 2: Graphics for Time Series

1. Air miles

The dataset airmiles is a time series of the miles flown annually by commercial airlines in the US from 1937 to 1960.

  1. Before plotting the graph, think about what shape you would expect it to have. Plot the series and comment on the differences between what you get and your expectations.

  2. Which aspect ratio conveys the information you find in the series best?

  3. Do you think the graph looks better as a line graph (as suggested on the R help page for the dataset) or with points as well?

  4. Might plotting a transformation help you to look more closely at the early years or would zooming in be sufficient?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     412    1580    6431   10528   17532   30514

Answer

  • question a: I expected the series to rise year by year, and the plot matches that expectation: as time goes on, more and more people travel by plane.
  • question b: The plot keeps the same aspect ratio, which makes the trend of increasing mileage easier to see.
  • question c: I think a line graph is better, because a line reflects the upward trend.
  • question d: After taking logs, the effect of the overall increase is reduced, and the pattern of change in the data can be seen more clearly.
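
The display choices discussed in these answers can be tried directly. The following is a minimal sketch in base R that uses only the built-in airmiles series. The 2 x 2 panel layout, the axis labels, and the cut-off year 1945 for the "early years" are illustrative assumptions, not part of the exercise.

```r
# airmiles is a yearly ts object in the 'datasets' package that ships with R.
op <- par(mfrow = c(2, 2))

# Line only, as suggested on the R help page for the dataset.
plot(airmiles, type = "l", main = "Line", ylab = "Miles flown")

# Line with points, which marks the individual years.
plot(airmiles, type = "o", main = "Line and points", ylab = "Miles flown")

# Logarithmic vertical axis, which stretches out the small early values.
plot(airmiles, log = "y", main = "Log scale", ylab = "Miles flown (log scale)")

# Zooming in instead: keep only the years up to 1945 (an arbitrary cut-off).
early <- window(airmiles, end = 1945)
plot(early, type = "o", main = "1937 to 1945", ylab = "Miles flown")

par(op)
```

The aspect ratio of each panel follows the size of the graphics device, so resizing the window (or setting fig.width and fig.height in an R Markdown chunk) is one way to compare aspect ratios.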

2. Beveridge Wheat Price Index

The Beveridge index of wheat prices covers almost four hundred years of European history from 1500 to 1869 and is available in the dataset bev in tseries.

  1. Plot the series and explain why you have decided to plot it in that way.

  2. Are there any particular features in the series which stand out? How would you summarise the information in the series in words?

  3. Many important historical events took place over this time period, including the Thirty Years’ War, the English Civil War, and the Napoleonic Wars. Is there any evidence of any of these having an effect on the index?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    11.0    64.0    98.0   107.9   143.8   381.0

Answer:
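
One possible starting point for questions 1 and 3 is sketched below. It assumes the tseries package is installed and skips the plot otherwise. The war dates are the usual historical ones (1618 to 1648, 1642 to 1651, 1803 to 1815); the grey shading is only an illustrative way of marking them.

```r
# The three wars named in the exercise, with their usual start and end years.
wars <- data.frame(
  name = c("Thirty Years' War", "English Civil War", "Napoleonic Wars"),
  from = c(1618, 1642, 1803),
  to   = c(1648, 1651, 1815)
)

# Assumes the tseries package is installed; bev is a yearly ts object.
if (requireNamespace("tseries", quietly = TRUE)) {
  data(bev, package = "tseries")

  plot(bev, type = "l", xlab = "Year", ylab = "Wheat price index")

  # Shade each war period over the full height of the plotting region.
  usr <- par("usr")  # plot limits in the order x1, x2, y1, y2
  rect(wars$from, usr[3], wars$to, usr[4],
       col = adjustcolor("grey40", alpha.f = 0.3), border = NA)
}
```

For a series this long, a wide and fairly flat device usually keeps neighbouring peaks from being squeezed together, which is worth trying before judging any effect of the wars.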

3. Goals in soccer games

The Bundesliga dataset was used in §11.2.

  1. Plot graphs of the rates of home and away goals per game over the seasons in the same plot. What limits do you recommend for the vertical scale?

  2. Other possibilities for studying the home and away goal rates per game include plotting the differences or ratios over time and drawing a scatterplot of one rate against another. Is there any information in these graphics that is shown better by one than the others?

  3. Can you find equivalent data for the top soccer league in your own country and are there similar patterns over the years?

## 'data.frame':    14018 obs. of  7 variables:
##  $ HomeTeam : Factor w/ 52 levels "1. FC Kaiserslautern",..: 50 27 33 18 28 4 43 37 12 3 ...
##  $ AwayTeam : Factor w/ 52 levels "1. FC Kaiserslautern",..: 12 3 24 1 31 2 17 45 43 50 ...
##  $ HomeGoals: int  3 1 1 1 1 0 1 2 3 3 ...
##  $ AwayGoals: int  2 1 1 1 4 2 1 0 3 0 ...
##  $ Round    : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ Year     : int  1963 1963 1963 1963 1963 1963 1963 1963 1963 1963 ...
##  $ Date     : POSIXt, format: "1963-08-24 17:30:00" "1963-08-24 17:30:00" ...

Answer:

  • a: A narrower vertical scale is recommended, so that the two lines are shown more clearly.
  • b: The line chart shows that the goal rate in home games is higher than in away games.
  • c:
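
The three displays mentioned in the questions can be sketched as follows. The helper name goal_rates is hypothetical, and the code assumes that the vcd package (which provides Bundesliga with the columns shown in the output above) is installed. Starting the vertical scale at zero is one option for question 1, not a recommendation from the source.

```r
# Mean home and away goals per game for each season.
goal_rates <- function(games) {
  aggregate(cbind(HomeGoals, AwayGoals) ~ Year, data = games, FUN = mean)
}

# Assumes the vcd package is installed; the plots are skipped otherwise.
if (requireNamespace("vcd", quietly = TRUE)) {
  data(Bundesliga, package = "vcd")
  rates <- goal_rates(Bundesliga)

  op <- par(mfrow = c(1, 3))

  # Both rates in one plot, with the vertical scale starting at zero.
  matplot(rates$Year, rates[, c("HomeGoals", "AwayGoals")], type = "l",
          lty = 1, col = c("black", "red"),
          ylim = c(0, max(rates$HomeGoals, rates$AwayGoals)),
          xlab = "Season", ylab = "Goals per game")
  legend("bottomleft", legend = c("Home", "Away"),
         lty = 1, col = c("black", "red"))

  # Difference between the two rates over time.
  plot(rates$Year, rates$HomeGoals - rates$AwayGoals, type = "l",
       xlab = "Season", ylab = "Home minus away goals per game")

  # Scatterplot of one rate against the other.
  plot(rates$AwayGoals, rates$HomeGoals,
       xlab = "Away goals per game", ylab = "Home goals per game")

  par(op)
}
```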

4. Male and female births

Important early demographic analyses were carried out on English data from the seventeenth century. The Arbuthnot dataset in the HistData package includes data on the numbers of male and female christenings in London from 1629 to 1710.

  1. Plot the number of male christenings over time. Which features stand out?

  2. Why do you think there was a low level of christenings from around the mid-1640’s to 1660?

  3. Two low outliers stand out, in 1666, presumably because of the Great Fire of London and the plague the previous year, and in 1704. A possible explanation for the 1704 outlier is given on the R help page for the dataset. Compare the data values for 1674 and 1704 to check the explanation.

## 'data.frame':    82 obs. of  7 variables:
##  $ Year     : int  1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 ...
##  $ Males    : int  5218 4858 4422 4994 5158 5035 5106 4917 4703 5359 ...
##  $ Females  : int  4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 ...
##  $ Plague   : int  0 1317 274 8 0 1 0 10400 3082 363 ...
##  $ Mortality: int  8771 10554 8562 9535 8393 10400 10651 23359 11763 13624 ...
##  $ Ratio    : num  1.11 1.09 1.08 1.09 1.07 ...
##  $ Total    : num  9.9 9.31 8.52 9.58 10 ...

Answer:

  • a: There is an overall upward trend, with some periods of sudden decline.
  • b: Probably because of the impact of the English Civil War.
  • c: 1674 and 1704 have the same values.
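
The check in answer c can be written down explicitly. This is a small sketch, assuming the HistData package is installed; same_counts is a hypothetical helper name, and the column names are taken from the output above.

```r
# TRUE if the given columns hold identical values in the two years.
same_counts <- function(d, year1, year2, cols = c("Males", "Females")) {
  a <- d[d$Year == year1, cols]
  b <- d[d$Year == year2, cols]
  all(unlist(a) == unlist(b))
}

# Assumes the HistData package is installed; skipped otherwise.
if (requireNamespace("HistData", quietly = TRUE)) {
  data(Arbuthnot, package = "HistData")

  # Male christenings over time (question 1).
  plot(Arbuthnot$Year, Arbuthnot$Males, type = "l",
       xlab = "Year", ylab = "Male christenings")

  # The two years named in question 3, side by side.
  print(Arbuthnot[Arbuthnot$Year %in% c(1674, 1704),
                  c("Year", "Males", "Females")])
  print(same_counts(Arbuthnot, 1674, 1704))
}
```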

5. Goals in soccer games (again)

Consider the numbers of goals scored by each team.

  1. How would you plot the annual average goals per home game for each team in the Bundesliga over the 46 seasons in the dataset? Would you choose a single graphic or a trellis display? Only one team has been a member of the Bundesliga ever since it started, Hamburg. How do you think the time series of teams with incomplete records should be displayed?

  2. You could compare the annual home and away scoring rates of particular teams by plotting the two time series on the same display or by drawing a scatterplot of one variable against the other. Using the two teams Hamburg and Bayern Munich, comment on which display you think is better. Do the displays provide different kinds of information?

Answer:

  • a: I think the latter, a trellis display, is better for showing this kind of data. A heat map could also show this kind of data structure.
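
A trellis display of this kind could be sketched as below. The helper team_home_rates is a hypothetical name. It builds a season-by-team table in which seasons a team did not play are NA, so lines drawn from it are expected to break at those seasons rather than bridge them. The code assumes the vcd and lattice packages are installed.

```r
# Average home goals per game for every season/team combination.
# Combinations that never occur are kept as NA.
team_home_rates <- function(games) {
  m <- tapply(games$HomeGoals,
              list(Year = games$Year, Team = as.character(games$HomeTeam)),
              mean)
  long <- as.data.frame(as.table(m), responseName = "Rate",
                        stringsAsFactors = FALSE)
  long$Year <- as.integer(long$Year)
  long
}

# Assumes vcd (for the data) and lattice (for the trellis) are installed.
if (requireNamespace("vcd", quietly = TRUE) &&
    requireNamespace("lattice", quietly = TRUE)) {
  data(Bundesliga, package = "vcd")
  home <- team_home_rates(Bundesliga)

  # One panel per team, with common scales across panels.
  print(lattice::xyplot(Rate ~ Year | Team, data = home, type = "l",
                        as.table = TRUE, ylab = "Home goals per game"))
}
```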

## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.

6. Deaths by horsekick

Plot separate displays for each of the 14 corps in the von Bortkiewicz dataset (VonBort in vcd).

## Loading required package: grid
## Warning in plot.xy(xy, type, ...): plot type 'line' will be truncated to first
## character

  1. Do any of the patterns stand out as different?

  2. 11 of the 14 corps had no deaths in the first year (1875). Could this be worth looking into?

Answer:
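
The warning in the output above is what base graphics reports when a plot type is spelled out as 'line'; the one-character code type = "l" avoids it. The sketch below draws one panel per corps on a common vertical scale and tabulates the first-year counts. It assumes the vcd package is installed and that VonBort has the columns deaths, year and corps; the 4 x 4 layout is an arbitrary choice.

```r
# Assumes the vcd package is installed; skipped otherwise.
if (requireNamespace("vcd", quietly = TRUE)) {
  data(VonBort, package = "vcd")
  corps <- unique(as.character(VonBort$corps))

  op <- par(mfrow = c(4, 4), mar = c(2, 2, 2, 1))
  for (k in corps) {
    d <- VonBort[VonBort$corps == k, ]
    d <- d[order(d$year), ]
    # type = "l" (not "line") draws lines without the truncation warning.
    plot(d$year, d$deaths, type = "l", ylim = range(VonBort$deaths),
         main = k, xlab = "", ylab = "")
  }
  par(op)

  # Number of corps at each death count in the first recorded year.
  first_year <- VonBort[VonBort$year == min(VonBort$year), ]
  print(table(first_year$deaths))
}
```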

7. Economics data

The package ggplot2 includes a dataset of five US economic indicators recorded monthly over about 40 years, economics.

## spec_tbl_df [574 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ date    : Date[1:574], format: "1967-07-01" "1967-08-01" ...
##  $ pce     : num [1:574] 507 510 516 512 517 ...
##  $ pop     : num [1:574] 198712 198911 199113 199311 199498 ...
##  $ psavert : num [1:574] 12.6 12.6 11.9 12.9 12.8 11.8 11.7 12.3 11.7 12.3 ...
##  $ uempmed : num [1:574] 4.5 4.7 4.6 4.9 4.7 4.8 5.1 4.5 4.1 4.6 ...
##  $ unemploy: num [1:574] 2944 2945 2958 3143 3066 ...
  1. If you plot all five series in one display, is it better to standardise them all at a common value initially or to align them at their means and divide by their standard deviations? What information is shown in the two displays?
  2. Alternatively you could plot each series separately with its own scale. Do these displays provide additional information and is there any information that was shown in the displays of all series together that is not so easy to see here?
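
The two ways of putting the five series on one scale in question 1 can be written as two small functions. This is a sketch: index_to_first and standardise are hypothetical helper names, the base value of 100 is a convention, and the plotting part assumes ggplot2 is installed.

```r
# Rescale a series so that its first value equals 100.
index_to_first <- function(x) 100 * x / x[1]

# Align a series at its mean and divide by its standard deviation.
standardise <- function(x) (x - mean(x)) / sd(x)

# Assumes ggplot2 (which provides the economics data) is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  econ <- as.data.frame(ggplot2::economics)
  vars <- c("pce", "pop", "psavert", "uempmed", "unemploy")

  # Decimal year, so that matplot gets a plain numeric x axis.
  yr <- as.numeric(format(econ$date, "%Y")) +
    (as.numeric(format(econ$date, "%m")) - 1) / 12

  op <- par(mfrow = c(1, 2))
  matplot(yr, sapply(econ[vars], index_to_first), type = "l", lty = 1,
          col = seq_along(vars), xlab = "Year",
          ylab = "Index (first month = 100)")
  legend("topleft", legend = vars, lty = 1, col = seq_along(vars), cex = 0.7)
  matplot(yr, sapply(econ[vars], standardise), type = "l", lty = 1,
          col = seq_along(vars), xlab = "Year",
          ylab = "Standard deviations from mean")
  par(op)
}
```

The indexed version shows growth relative to the starting month, so fast-growing series dominate the scale; the standardised version gives every series the same spread, which makes shapes comparable but hides how much each one grew.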

8. Australian rain

The dataset bomregions in the DAAG package includes seven regional time series of annual rain in Australia and one time series averaged over the country.

  1. Can all seven regional series be plotted in one display or are individual displays more informative?
## 'data.frame':    109 obs. of  22 variables:
##  $ Year     : num  1900 1901 1902 1903 1904 ...
##  $ eastAVt  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ seAVt    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ southAVt : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ swAVt    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ westAVt  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ northAVt : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ mdbAVt   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ auAVt    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ eastRain : num  430 500 315 694 565 ...
##  $ seRain   : num  603 511 421 628 551 ...
##  $ southRain: num  375 314 284 421 388 ...
##  $ swRain   : num  738 559 542 729 711 ...
##  $ westRain : num  400 323 363 377 418 ...
##  $ northRain: num  360 476 345 601 604 ...
##  $ mdbRain  : num  413 365 256 525 448 ...
##  $ auRain   : num  369 402 317 519 505 ...
##  $ SOI      : num  -5.55 0.992 0.458 4.933 4.35 ...
##  $ co2mlo   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ co2law   : num  296 296 296 297 297 ...
##  $ CO2      : num  296 297 297 297 298 ...
##  $ sunspot  : num  9.5 2.7 5 24.4 42 63.5 53.8 62 48.5 43.9 ...

  2. Are there any outliers in the series and do they affect the scales used adversely?
  3. Is there any evidence of trend in the series? Are there cyclical effects?
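
A possible way into these three questions is sketched below: all seven regional series in one display, then one panel per region with a centred running mean as a rough guide to trend. The helper running_mean and the window of 11 years are assumptions made for illustration. The code also assumes that the DAAG package is installed and still ships a dataset called bomregions with the column names shown above.

```r
# Centred running mean over k values; the ends are NA.
running_mean <- function(x, k = 11) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Assumes DAAG is installed and provides bomregions; skipped otherwise.
if (requireNamespace("DAAG", quietly = TRUE)) {
  data(bomregions, package = "DAAG")
  regions <- c("eastRain", "seRain", "southRain", "swRain",
               "westRain", "northRain", "mdbRain")

  # All seven regional series in one display.
  matplot(bomregions$Year, bomregions[, regions], type = "l", lty = 1,
          col = seq_along(regions), xlab = "Year", ylab = "Annual rain")
  legend("topleft", legend = regions, lty = 1,
         col = seq_along(regions), cex = 0.7)

  # Individual displays, each with its own scale and a running mean.
  op <- par(mfrow = c(2, 4), mar = c(3, 3, 2, 1))
  for (r in regions) {
    plot(bomregions$Year, bomregions[[r]], type = "l", col = "grey50",
         main = r, xlab = "", ylab = "")
    lines(bomregions$Year, running_mean(bomregions[[r]]), lwd = 2)
  }
  par(op)
}
```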

9. Tree rings

  1. Plot all 34 series in separate displays. Are there any common features?
## 
## Attaching package: 'dplR'
## The following object is masked from 'package:zoo':
## 
##     time<-
## Classes 'rwl' and 'data.frame':  1358 obs. of  34 variables:
##  $ CAM011: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM021: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM031: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM032: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM041: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM042: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM051: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM061: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM062: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM071: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM072: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM081: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM082: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM091: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM092: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM101: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM102: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM111: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM112: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM121: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM122: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM131: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM132: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM141: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM151: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM152: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM161: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM162: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM171: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM172: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM181: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM191: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM201: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CAM211: num  0.17 0.13 0.14 0.19 0.22 0.27 0.31 0.22 0.28 0.34 ...

  2. There are at least two series with much higher maxima than the others. Compare a display excluding these series, but still retaining the same scaling for all the plots, with a display where each series is plotted with its own scale. What are the advantages and disadvantages of the two approaches?
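
The comparison asked for here can be sketched as follows. The helper names highest_maxima and plot_panels are hypothetical. The dataset name ca533 is an assumption: the source does not name the dataset, and ca533 is used only because the dplR output above (34 series with CAM prefixes) appears to match it. The code is skipped if dplR is not installed.

```r
# Names of the n series with the largest maxima (NAs ignored).
highest_maxima <- function(series, n = 2) {
  maxima <- vapply(series, max, numeric(1), na.rm = TRUE)
  names(sort(maxima, decreasing = TRUE))[seq_len(n)]
}

# One small panel per series; ylim = NULL lets each panel pick its own scale.
plot_panels <- function(series, years, ylim = NULL) {
  op <- par(mfrow = c(5, 7), mar = c(1, 1, 1.5, 0.5))
  on.exit(par(op))
  for (s in names(series)) {
    plot(years, series[[s]], type = "l", ylim = ylim, main = s,
         xlab = "", ylab = "", axes = FALSE)
    box()
  }
}

# Assumption: the data are dplR's ca533, with years stored as row names.
if (requireNamespace("dplR", quietly = TRUE)) {
  data(ca533, package = "dplR")
  years <- as.numeric(rownames(ca533))

  # Display 1: drop the two highest series, common scale for the rest.
  kept <- ca533[, setdiff(names(ca533), highest_maxima(ca533, 2))]
  plot_panels(kept, years, ylim = range(unlist(kept), na.rm = TRUE))

  # Display 2: all series, each with its own scale.
  plot_panels(ca533, years)
}
```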

10. Intermission

Salvador Dali’s painting The Persistence of Memory is in the New York Museum of Modern Art. Do you think the distorted clocks could be interpreted as alternative models of time series?

Answer: I think it is possible, in that several graphics could be combined to represent the painting in an integrated way.